AITopics | performance prediction

Collaborating Authors

performance prediction

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Zero-Shot Performance Prediction for Probabilistic Scaling Laws

Neural Information Processing SystemsJun-14-2026, 11:51:03 GMT

The prediction of learning curves for Natural Language Processing (NLP) models enables informed decision-making to meet specific performance objectives, while reducing computational overhead and lowering the costs associated with dataset acquisition and curation. In this work, we formulate the prediction task as a multitask learning problem, where each task's data is modelled as being organized within a two-layer hierarchy. To model the shared information and dependencies across tasks and hierarchical levels, we employ latent variable multi-output Gaussian Processes, enabling to account for task correlations and supporting zero-shot prediction of learning curves (LCs). We demonstrate that this approach facilitates the development of probabilistic scaling laws at lower costs. Applying an active learning strategy, LCs can be queried to reduce predictive uncertainty and provide predictions close to ground truth scaling laws.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.45)
Asia (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)
Instructional Material (0.67)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
Education (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

Scalable Early Childhood Reading Performance Prediction

Neural Information Processing SystemsFeb-16-2026, 08:39:55 GMT

Currently, students are identified as needing additional educational support using a'wait-to-fail' approach, i.e., waiting until a child has not made expected gains in reading before there is a reevaluation of their instructional needs.

artificial intelligence, data mining, machine learning, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > Oregon (0.04)
North America > United States > Massachusetts (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Instructional Material (0.93)

Industry:

Education > Educational Technology > Educational Software (0.68)
Education > Educational Setting > K-12 Education > Primary School (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

LOVM: Language-Only Vision Model Selection

Neural Information Processing SystemsFeb-13-2026, 05:38:57 GMT

Pre-trained multi-modal vision-language models (VLMs) are becoming increasingly popular due to their exceptional performance on downstream vision applications, particularly in the few-and zero-shot settings.

large language model, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.05)
Europe > Switzerland (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Look Before you Leap: Estimating LLM Benchmark Scores from Descriptions

Park, Jungsoo, Mendes, Ethan, Stanovsky, Gabriel, Ritter, Alan

arXiv.org Artificial IntelligenceDec-3-2025

Progress in large language models is constrained by an evaluation bottleneck: build a benchmark, run models, then iterate. We ask a question: can we forecast outcomes before running any experiments to inform earlier study design? For example, a team building an AI assistant for a certain task can estimate whether expected performance is around 50 or closer to 80, evidence that supports whether to proceed to a pilot study, how to scope it, and how to allocate resources. We study text-only performance forecasting, where a model predicts a score from a redacted task description and intended configuration, with no access to dataset instances. To support systematic study, we curate PRECOG, a corpus of redacted description-performance pairs spanning diverse tasks, domains, and metrics. We scrape task and configuration descriptions from arXiv, yielding 2,290 instances covering 1,519 papers, and construct a leakage free test split using papers published after the knowledge cutoff of the evaluated models. Experiments show the task is challenging but feasible: reasoning models achieve moderate prediction performance with well calibrated uncertainty, reaching mean absolute error as low as 9.9 at high confidence thresholds. We further test a zero-leakage setting, forecasting on newly released datasets or experiments before their papers are indexed, where GPT5 with built in web search still attains nontrivial prediction accuracy. Overall, our corpus and analyses offer an initial step toward open ended anticipatory evaluation, supporting difficulty estimation and smarter experiment prioritization.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2509.20645

Genre: Research Report > Experimental Study (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

Add feedback

A Multi-level Analysis of Factors Associated with Student Performance: A Machine Learning Approach to the SAEB Microdata

Tertulino, Rodrigo, Almeida, Ricardo

arXiv.org Artificial IntelligenceNov-18-2025

Identifying the determinants of academic success in basic education represents a central challenge for educational research and policymaking, particularly in a country with Brazil's vast dimensions and socioeconomic heterogeneity (Issah et al. 2023). A systemic approach is crucial, as student performance is influenced by a complex interplay of factors spanning individual, academic, socioeconomic, and institutional domains (Barrag an Moreno and Guzm an Rinc on 2025). The System of Assessment of Basic Education (SAEB), conducted by the National Institute for Educational Studies and Research An ısio Teixeira (INEP) (INEP 2025), provides a rich, multi-level dataset uniquely suited for such an analysis (Bonamino et al. 2010). The public availability of its anonymized microdata enables the research community to investigate the intricate relationships between student proficiency and a wide array of contextual factors, from socioeconomic backgrounds to school infrastructure and teacher profiles. Consequently, the SAEB microdata is an essential resource for data-driven research aimed at informing and evaluating educational policies in the country (Lundberg and Lee 2017b; Mazoni and Oliveira 2023). While traditional statistical methods are common, the Educational Data Mining (EDM) paradigm offers powerful tools for uncovering complex, non-linear patterns from such data (Romero and Ventura 2010). Furthermore, we demonstrate that by interpreting the model's classification results with XAI techniques, our method provides data-driven insights for educators and policymakers (Idrizi 2024). The primary objective of this research is thus to develop and evaluate a multi-level machine learning model to identify the key systemic factors associated with the academic performance of 9th-grade and high school students, using the SAEB microdata. Building upon this perspective, the study shifts its analytical focus from purely individual student interventions toward addressing the systemic determinants that shape educational outcomes in Brazilian basic education.

artificial intelligence, machine learning, student, (17 more...)

arXiv.org Artificial Intelligence

2510.22266

Country:

North America > United States (0.93)
South America (0.67)

Genre:

Research Report > New Finding (1.00)
Instructional Material (1.00)

Industry:

Education > Assessment & Standards > Student Performance (1.00)
Education > Educational Setting > Higher Education (0.69)
Education > Curriculum > Subject-Specific Education (0.67)
Education > Educational Setting > K-12 Education > Secondary School (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Representing LLMs in Prompt Semantic Task Space

Kashani, Idan, Mendelson, Avi, Nemcovsky, Yaniv

arXiv.org Artificial IntelligenceNov-13-2025

Large language models (LLMs) achieve impressive results over various tasks, and ever-expanding public repositories contain an abundance of pre-trained models. Therefore, identifying the best-performing LLM for a given task is a significant challenge. Previous works have suggested learning LLM representations to address this. However, these approaches present limited scalability and require costly retraining to encompass additional models and datasets. Moreover, the produced representation utilizes distinct spaces that cannot be easily interpreted. This work presents an efficient, training-free approach to representing LLMs as linear operators within the prompts' semantic task space, thus providing a highly interpretable representation of the models' application. Our method utilizes closed-form computation of geometrical properties and ensures exceptional scalability and real-time adaptability to dynamically expanding repositories. We demonstrate our approach on success prediction and model selection tasks, achieving competitive or state-of-the-art results with notable performance in out-of-sample scenarios.

computational linguistic, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2025.findings-emnlp.456

2509.22506

Country:

Asia (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Research Report (1.00)
Overview (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

ZeroSim: Zero-Shot Analog Circuit Evaluation with Unified Transformer Embeddings

Yang, Xiaomeng, Gao, Jian, Wang, Yanzhi, Zhang, Xuan

arXiv.org Artificial IntelligenceNov-12-2025

Although recent advancements in learning-based analog circuit design automation have tackled tasks such as topology generation, device sizing, and layout synthesis, efficient performance evaluation remains a major bottleneck. Traditional SPICE simulations are time-consuming, while existing machine learning methods often require topology-specific retraining or manual substructure segmentation for fine-tuning, hindering scalability and adaptability. In this work, we propose ZeroSim, a transformer-based performance modeling framework designed to achieve robust in-distribution generalization across trained topologies under novel parameter configurations and zero-shot generalization to unseen topologies without any fine-tuning. We apply three key enabling strategies: (1) a diverse training corpus of 3.6 million instances covering over 60 amplifier topologies, (2) unified topology embeddings leveraging global-aware tokens and hierarchical attention to robustly generalize to novel circuits, and (3) a topology-conditioned parameter mapping approach that maintains consistent structural representations independent of parameter variations. Our experimental results demonstrate that ZeroSim significantly outperforms baseline models such as multilayer perceptrons, graph neural networks and transformers, delivering accurate zero-shot predictions across different amplifier topologies. Additionally, when integrated into a reinforcement learning-based parameter optimization pipeline, ZeroSim achieves a remarkable speedup (13x) compared to conventional SPICE simulations, underscoring its practical value for a wide range of analog circuit design automation tasks.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.07658

Genre: Research Report (0.70)

Industry: Semiconductors & Electronics (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)

Add feedback

ODP-Bench: Benchmarking Out-of-Distribution Performance Prediction

Yu, Han, Li, Kehan, Li, Dongbai, He, Yue, Zhang, Xingxuan, Cui, Peng

arXiv.org Artificial IntelligenceNov-3-2025

Recently, there has been gradually more attention paid to Out-of-Distribution (OOD) performance prediction, whose goal is to predict the performance of trained models on unlabeled OOD test datasets, so that we could better leverage and deploy off-the-shelf trained models in risk-sensitive scenarios. Although progress has been made in this area, evaluation protocols in previous literature are inconsistent, and most works cover only a limited number of real-world OOD datasets and types of distribution shifts. T o provide convenient and fair comparisons for various algorithms, we propose Out-of-Distribution Performance Prediction Benchmark (ODP-Bench), a comprehensive benchmark that includes most commonly used OOD datasets and existing practical performance prediction algorithms. W e provide our trained models as a testbench for future researchers, thus guaranteeing the consistency of comparison and avoiding the burden of repeating the model training process. Furthermore, we also conduct in-depth experimental analyses to better understand their capability boundary.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2510.27263

Genre: Research Report (0.83)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)

Add feedback

SimBA: Simplifying Benchmark Analysis Using Performance Matrices Alone

Subramani, Nishant, Gomez, Alfredo, Diab, Mona

arXiv.org Artificial IntelligenceOct-22-2025

Modern language models are evaluated on large benchmarks, which are difficult to make sense of, especially for model selection. Looking at the raw evaluation numbers themselves using a model-centric lens, we propose SimBA, a three phase framework to Simplify Benchmark Analysis. The three phases of SimBA are: stalk, where we conduct dataset & model comparisons, prowl, where we discover a representative subset, and pounce, where we use the representative subset to predict performance on a held-out set of models. Applying SimBA to three popular LM benchmarks: HELM, MMLU, and BigBenchLite reveals that across all three benchmarks, datasets and models relate strongly to one another (stalk). We develop an representative set discovery algorithm which covers a benchmark using raw evaluation scores alone. Using our algorithm, we find that with 6.25% (1/16), 1.7% (1/58), and 28.4% (21/74) of the datasets for HELM, MMLU, and BigBenchLite respectively, we achieve coverage levels of at least 95% (prowl). Additionally, using just these representative subsets, we can both preserve model ranks and predict performance on a held-out set of models with near zero mean-squared error (pounce). Taken together, SimBA can help model developers improve efficiency during model training and dataset creators validate whether their newly created dataset differs from existing datasets in a benchmark. Our code is open source, available at https://github.com/nishantsubramani/simba.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.17998

Genre: Research Report (1.00)

Technology: